Chapter 19: Inference for a Single Mean

Overview

  • 19.1 Bootstrap confidence interval for a mean
    • 19.1.1 Observed data
    • 19.1.2 Variability of the statistic
    • 19.1.3 Bootstrap SE confidence interval
    • 19.1.4 Bootstrap percentile confidence interval for a standard deviation
    • 19.1.5 Bootstrapping is not a solution to small sample sizes!
  • 19.2 Mathematical model for a mean
    • 19.2.1 Mathematical distribution of the sample mean
    • 19.2.2 Evaluating the two conditions required for modeling \(\bar{x}\)
    • 19.2.3 Introducing the t-distribution
    • 19.2.4 One sample t-intervals
    • 19.2.5 One sample t-tests

19.1 Bootstrap confidence interval for a mean

  • We use bootstrapping to estimate the sampling distribution of a statistic.
  • The process of bootstrapping for a sample mean is the same as bootstrapping for a sample proportion.
  • Since our data is numeric, we calculate the sample mean \(\bar{x}\), instead of the sample proportion \(\hat{p}\).

19.1.1 Observed data

19.1.2 Variability of the statistic

19.1.3 Bootstrap SE confidence interval

Once we have a bootstrapped sampling distribution, we can use it to make confidence intervals.

  • The 95% percentile confidence interval is the the interval from the 2.5%ile to the 97.5%ile.
  • The 95% SE confidence interval is the quick approximation \(\mbox{point estimate} \pm 2 \cdot SE_{BS}\)
    • We are using 2 instead of 1.96.

19.1.5 Bootstrapping is not a solution to small sample sizes!

  • Note: Bootstrapping (and other statistical models) work best for larger samples.
  • The car price example had a sample size of \(n=5\), which is pretty small.

Group Activity

price
18300
20100
9600
10700
27000
  1. Open the Bootstrapping Applet. Clear the preloaded data and paste in the above price data (including the header).

  2. Check Show Sampling Options and make a bootstrap distribution of sample means from this sample.

  3. Use the SD from the distribution to calculate a 95% SE confidence interval for the mean price.

  4. Select Beyond from the count samples dropdown. Use trial and error to find the upper endpoint of a 95% percentile confidence interval.

19.2 Mathematical model for a mean

19.2.1 Mathematical distribution of the sample mean

Central Limit Theorem for the sample mean. When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of \(\bar{x}\) will be nearly normal with mean \(\mu\) and standard deviation \(SE = \sigma/\sqrt{n}\).

  • In practice, we don’t usually know \(\sigma\), so we use \(SE \approx s/\sqrt{n}\).

19.2.2 Evaluating the two conditions required for modeling \(\bar{x}\)

  • Independence. e.g., the sample is a simple random sample from the population.
  • Normality. If the sample is small, then the data should come from a normal, or nearly normal, population.
    • “small” means less than 15
    • “large” means greater than 60
    • Extreme outliers can cause problems.

19.2.3 Introducing the t-distribution

  • For smaller samples, the sampling distribution of \(\bar{x}\) has thicker tails than the normal distribution.
  • A better approximation is the \(t-distribution\).
    • As with the chi-squared distribution, the t-distribution requires you to specify the degrees of freedom.
    • The sampling distribution of \(\bar{x}\) has \(n-1\) degrees of freedom, where \(n\) is the sample size.

Tail probabilities of the t-distribution in R

As with pnorm and pchisq, you can get lower tail probabilities in R using pt.

pt(-2.5, df = 19)  # lower tail below -2.5
[1] 0.01087021
1 - pt(2.5, df = 19) # upper tail above 2.5
[1] 0.01087021

Quantiles of the t-distribution in R

If you know the tail probability you want, you can get the corresponding t.

qt(0.025, df = 11)
[1] -2.200985
qt(0.975, df = 11)
[1] 2.200985

Recall that for the normal distribution, these values were \(\pm 1.96\) (fatter tails).

19.2.4 One sample t-intervals

The generic formulas for confidence intervals work for t-intervals:

\[ \begin{aligned} \text{point estimate} \ &\pm\ t^{\star}_{df} \times SE \\ \bar{x} \ &\pm\ t^{\star}_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]

To get \(t^\star_{df}\) you can use qt in R and you need the degrees of freedom (\(n-1\)).

19.2.5 One sample t-tests

\[ H_0: \mu = \text{null value} \\ H_A: \mu \neq \text{null value} \]

The generic \(Z\)-score formula gives us a formula for \(T\)-score:

\[ T = \frac{\bar{x} - \mbox{null value}}{s/\sqrt{n}} \]

Group Activity

A sample of 14 frogs has a mean weight of 26.45 grams with a sample standard deviation of 3.98.

  1. Construct a 95% confidence interval for the mean weight of the frogs in this population. (Use qt to determine \(t^\star_{df}\), and then use the t-interval formula.)

  2. Is the mean weight of the frogs in this population significantly different from the Northern Leopard Frog, which has a mean weight of 24 grams? Obtain a P-value for the following hypothesis test: \[ H_0: \mu = 24 \\ H_A: \mu \neq 24 \] (Use the \(T\)-score formula, and then use pt to get a P-value.)

t-tests in R

The frogs vector contains the weights of the 14 frogs in our sample.

t.test(frogs, mu = 24)

    One Sample t-test

data:  frogs
t = 2.3008, df = 13, p-value = 0.0386
alternative hypothesis: true mean is not equal to 24
95 percent confidence interval:
 24.14950 28.74841
sample estimates:
mean of x 
 26.44895